Cryo-EM reveals an entangled kinetic trap in the folding of a catalytic RNA

Functional RNAs fold through complex pathways that can contain misfolded “kinetic traps.” A complete model of RNA folding requires understanding the formation of these misfolded states, but they are difficult to characterize because of their transient and potentially conformationally dynamic nature. We used cryo–electron microscopy (cryo-EM) to visualize a long-lived misfolded state in the folding pathway of the Tetrahymena thermophila group I intron, a paradigmatic RNA structure-function model system. The structure revealed how this state forms native-like secondary structure and tertiary contacts but contains two incorrectly crossed strands, consistent with a previous model. This incorrect topology mispositions a critical catalytic domain and cannot be resolved locally as extensive refolding is required. This work provides a structural framework for interpreting decades of biochemical and functional studies and demonstrates the power of cryo-EM for the exploration of RNA folding pathways.


INTRODUCTION
To function, RNAs must find their native three-dimensional (3D) fold among a multitude of alternative conformations on a biologically relevant time scale (1,2). This process is not trivial; the high stability and promiscuity of base-base interactions create an intrinsic thermodynamic propensity to form non-native contacts, potentially trapping the RNA in stable misfolded states and slowing down the folding process (1)(2)(3)(4). So-called "kinetic traps" have been detected in the folding pathways of several model RNAs and are more likely in large RNAs with intricate 3D folds (5)(6)(7). In vivo, kinetic traps may be resolved by the action of chaperones that actively unfold the RNA and allow it to refold and/or by the binding of proteins that bias the folding pathway toward the functional fold (8)(9)(10). Understanding the principles governing formation and resolution of misfolded states is thus a critical step toward a complete model of RNA folding. However, direct visualization of these states has not been generally possible by crystallography because they may be transient and/or conformationally dynamic. Dynamic states may be observed using nuclear magnetic resonance, but detailed structural information is limited by the size of the RNA. Given the latest advances in the ability of cryo-electron microscopy (cryo-EM) to solve dynamic RNA-only structures (11)(12)(13)(14), we reasoned that this technique may offer a way to directly observe the 3D structure of RNA folding intermediates, including misfolded states, which have been elusive to structural biology.
The self-splicing Tetrahymena thermophila group I intron, the first catalytic RNA found, and its multiturnover ribozyme derivative (herein referred to as TET; Fig. 1A) are well-established model systems used for decades to dissect general principles of RNA folding and catalysis (15)(16)(17)(18). TET catalyzes the cleavage of a substrate strand using an exogenous guanosine nucleophile (Fig. 1B). This ~125-kDa ribozyme folds into a compact structure with an internal core of stacked helices stabilized by tertiary contacts between surrounding peripheral domains (Fig. 1C).
Several intermediates have been identified in the folding pathway of TET, including a long-lived misfolded state (referred to as "M"), which can be experimentally accumulated (19)(20)(21)(22)(23). At standard in vitro conditions, ~10% of molecules fold directly to the native state (N) and ~90% to M (Fig. 1D). Refolding from M to N is very slow, in the time scale of hours, suggesting that this process requires considerable structural reorganization (22,23). Consistent with this, solution conditions and mutations that destabilize RNA tertiary structure accelerate refolding from M to N (23). Mutagenesis of the ribozyme core demonstrated that formation of non-native base pairs leads to M; thus, it was initially hypothesized that M contains non-native secondary structure elements near the catalytic center (22,24,25). Paradoxically, hydroxyl radical and dimethyl sulfate footprints of M and N show only minor differences localized mostly to the functionally important P7 helix (Fig. 1, A and C), suggesting that M and N essentially form the same secondary structure and tertiary contacts (23).
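The partitioning described above can be illustrated with a minimal kinetic sketch. It assumes a simple two-channel scheme in which ~10% of molecules reach N directly, ~90% accumulate in M, and M converts to N in a single first-order step. The rate constant used here is a hypothetical placeholder chosen only to give an hours-long time scale; it is not a measured value from this work.

```python
import math

def fraction_native(t_hours, k_per_hour, f_direct=0.10):
    """Fraction of molecules in N at time t.

    Assumes f_direct of the population folds straight to N and the
    remainder sits in M, converting to N with first-order rate k.
    """
    return 1.0 - (1.0 - f_direct) * math.exp(-k_per_hour * t_hours)

# Hypothetical rate constant (per hour), for illustration only.
K_ASSUMED = 0.5

for t in (0, 1, 4, 12):
    print(f"t = {t:>2} h  fraction native = {fraction_native(t, K_ASSUMED):.3f}")
```

Under these assumptions the native fraction starts at 0.10 and approaches 1 only after several multiples of 1/k, which is the qualitative behavior expected for a long-lived trap.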
To explain the paradoxical nature of M, it was hypothesized that, although M and N form nearly identical structures, two single-stranded elements are crossed incorrectly in M, resulting in a non-native trapped topology that requires extensive unfolding to be resolved (23). According to this model, the alternative secondary structure supported by mutations biases the folding pathway toward M and traps the ribozyme in the incorrect topology, but it is later replaced by native-like contacts before forming M. Although this hypothesis is consistent with the biochemical and functional data, the 3D structure of the M state has remained elusive, and the topological isomer model had not been tested directly.
Recently, complete structures of apo and holo TET ribozymes in the N state were solved by cryo-EM (12,14). Concurrent with those studies, we applied cryo-EM to solve the structure of both the N and the M states of TET. By directly observing and comparing these two states, we rationalize decades of studies and paradoxical observations and demonstrate the power of cryo-EM to dissect RNA folding intermediates.

RESULTS
[...] refinements using the cryoSPARC software (26) produced maps consistent with the global structure of TET, but the density at the core was not well defined and suggested a mixture of states (fig. S1). Similarly, unsupervised 3D classification using the Relion software (27) revealed heterogeneity localized to the ribozyme core, but the resulting maps did not display sufficient resolution to unambiguously differentiate the core conformations (fig. S2). To address this, we first performed 3D variability analysis, which uses probabilistic principal components analysis to fit a linear subspace describing variability in the particles (28), to generate multiple maps across the first principal component (movie S1 and fig. S3). These maps were then used as references for particle 3D classification and refined in cryoSPARC (fig. S3). Comparison of the refined maps revealed two conformational classes, with major differences localized to the core of the ribozyme (Fig. 2A and fig. S3). Other volumes generated by particle classification could be assigned to one of the two conformational classes (fig. S3). The best quality maps from each class were refined to 3.4- and 3.9-Å resolution (Fig. 2B and fig. S4).
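The idea of generating maps along a principal component can be sketched with a toy example. The code below is only an illustration under strong assumptions: it applies ordinary PCA (via singular value decomposition) to a stack of small synthetic volumes, whereas the analysis in the text uses probabilistic PCA on particle images as implemented in cryoSPARC. The function name and the percentile range are hypothetical choices.

```python
import numpy as np

def frames_along_first_component(volumes, n_frames=4):
    """Return n_frames volumes sampled along the first principal component.

    volumes: array of shape (n_samples, nx, ny, nz).
    Also returns each sample's coordinate (score) on that component.
    """
    X = np.asarray(volumes, dtype=float)
    flat = X.reshape(X.shape[0], -1)
    mean = flat.mean(axis=0)
    centered = flat - mean
    # Rows of Vt are the principal directions in voxel space.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    pc1 = Vt[0]
    scores = centered @ pc1
    # Sample between the 5th and 95th percentile to avoid outliers.
    lo, hi = np.percentile(scores, [5, 95])
    coords = np.linspace(lo, hi, n_frames)
    frames = (mean + coords[:, None] * pc1).reshape((n_frames,) + X.shape[1:])
    return frames, scores

# Synthetic two-state data set: each sample is state A or state B plus noise.
rng = np.random.default_rng(0)
state_a = rng.normal(size=(4, 4, 4))
state_b = rng.normal(size=(4, 4, 4))
labels = rng.integers(0, 2, size=200)
volumes = np.where(labels[:, None, None, None] == 1, state_b, state_a)
volumes = volumes + 0.1 * rng.normal(size=volumes.shape)

frames, scores = frames_along_first_component(volumes)
print(frames.shape)
```

In this toy case the first component tracks the difference between the two synthetic states, so frames at its two ends resemble the two states and can serve as classification references, analogous in spirit to the procedure described above.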
The first class (Fig. 2, A and B, yellow), which was refined to 3.4 Å, matches a published cryo-EM map of TET in the N state [map-in-map correlation coefficient (CC), 0.94], and the published structure docks well into the map without additional refinements (average CC_mask, 0.77; fig. S5). The map-model correlation per residue was essentially the same with the published map and with our map (fig. S5). Therefore, this class represents a population of N in the sample, consistent with previous studies showing that a fraction of molecules fold directly to N under the experimental conditions (Fig. 1D) (22,23). In contrast, the second class (Fig. 2, A and B, blue), which was refined to 3.9 Å, does not fit the structure of N and revealed major differences in the core near the functionally important P7 helix, which contains the conserved guanosine-binding site, indicating that this class likely corresponds to the M state.
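A map-in-map correlation coefficient of the kind quoted above can be sketched as a Pearson correlation over voxels. This is a minimal illustration that assumes both maps are already aligned and sampled on the same grid; the software used in this work may apply masks, density thresholds, or a different normalization, so values from this sketch are not expected to reproduce the reported numbers.

```python
import numpy as np

def map_correlation(map_a, map_b):
    """Pearson correlation between two density maps on the same grid."""
    a = np.asarray(map_a, dtype=float).ravel()
    b = np.asarray(map_b, dtype=float).ravel()
    if a.shape != b.shape:
        raise ValueError("maps must have the same number of voxels")
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        raise ValueError("at least one map has zero variance")
    return float(a @ b / denom)

# Synthetic example: a reference map and a noisy copy of it.
rng = np.random.default_rng(0)
reference = rng.normal(size=(16, 16, 16))
noisy = reference + 0.3 * rng.normal(size=reference.shape)
print(round(map_correlation(reference, noisy), 3))
```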
As expected, most (~74%) particles were classified into the M state (fig. S3B). However, the proportion of particles classified into M versus N does not match the previously determined partitioning of 90% into M and 10% into N (23). This is likely due to current limitations related to cryo-EM data acquisition and analysis that prevent particle classification from being quantitative. These limitations include the low signal-to-noise ratio of individual particle images and imperfect particle picking and junk removal.

3D structures reveal topological differences between the N and M states
The core of TET contains two sets of stacked helices: P4-P5-P6 and P3-P7-P8-P9 (Fig. 1A, gray box). J8/7 is an unpaired stretch of seven nucleotides that links P7 to P8 and makes contacts with the P3 helix (Fig. 1A). Comparison of N and M maps suggested that the peripheral domains and the P4-P5-P6 stack of the core were similar between the two conformations, although with minor differences discussed below. In contrast, the maps suggested that J8/7 extends in a different direction in each map (Fig. 2, A and B, red arrow) and that P7 is rotated in M relative to its position in N (Fig. 2, A and B, cyan asterisk). To learn more about these conformational differences, we fitted an atomic model into the density of M. As the resolution was not sufficiently high for manual fitting, we used autoDRRAFTER (13) for map-guided computational modeling of the core of M, keeping the peripheral elements, the P4-P5-P6 stack, and the secondary structure fixed during modeling (Fig. 2C and fig. S6). The full structure of M was refined using Phenix and Coot modeling software (Fig. 2D and fig. S6) (29,30). The final structure fits well into the map (CC_mask, 0.74) without major steric clashes (clash score, 6.61). While the moderate resolution of the map in the core does not allow atomic-level precision, the strand topology and global structure of M and its comparison to the known structure of N are clear and provide important insights. The final model of M reveals a core topology that diverges from that of N (Fig. 2, C and D). Whereas in N, the single strand J8/7 is "on top" of J7/3 (which joins helices P7 and P3) as seen from the orientation shown in Fig. 2C, in M, J8/7 passes "under" J7/3.
Further, the position of J8/7 in M strongly suggests that it docks into the major groove of P3, largely differing from its position away from the major groove of P3 in N; however, the resolution of the structure of M in this area is not sufficiently high to unambiguously describe the specific interactions made between J8/7 and P3. The entanglement of J8/7 is accompanied by a ~90° rotation of P7, likely induced by the new position of J8/7 that constrains proper placement of P7 (Fig. 2D). In contrast, helix P3 is essentially unmoved relative to its native position (Fig. 2D). Consistent with previous observations, long-range tertiary contacts are maintained, although large rearrangements occur within the core. These observations are consistent with the topological isomer hypothesis discussed above (23,31), although the specific topological differences and the identities of the entangled strands differed from those predicted. Visual inspection shows that this topological error cannot be resolved locally and requires major unfolding of the ribozyme (vide infra), consistent with the hours-long time scale at which the M to N transition occurs.
The structure of M is consistent with previous studies showing a strong link between the formation of an alternative P3 (alt-P3) secondary structure and formation of M (24,25). In alt-P3, nucleotides 303 to 306 in J8/7 are proposed to base pair with nucleotides 271 to 274, effectively preventing formation of the long-range native P3 (Fig. 1A). On the basis of the nearly identical chemical probing footprints of M and N, it was later proposed that formation of alt-P3 traps the ribozyme in the incorrect topology, but alt-P3 is later replaced by the native P3 (23). The map of M, although not allowing unambiguous positioning of every nucleotide, supports this model. In M, J8/7 docks into the major groove of P3 and is near nucleotides 271 to 274, where it could readily pair to form alt-P3 (Fig. 3A). Thus, only local rearrangements are required to transition from an intermediate state with alt-P3 to the M state. Although we cannot make any conclusions about the conformation of these elements before M is formed, the structure suggests how formation of alt-P3 could position J8/7 in its final trapped position in M.
The M-state structure explains its decreased catalytic activity
Given the similarity of the structures of N and M suggested by chemical probing, the difference in their catalytic activity was puzzling, but the 3D structure of M now resolves this paradox. As shown in a simplified scheme of the TET catalytic cycle (Fig. 1B), the P1 duplex, formed by base pairing of the substrate strand to the internal guide sequence, docks into the core, positioning the 5′ splice site in proximity to the bound guanosine nucleophile. Structures of N with and without bound substrates showed that the core is globally preorganized for P1 docking (Fig. 3B) (12). This is supported by solution small-angle x-ray scattering (SAXS) data, which are consistent with no major changes in the overall structure of the ribozyme upon substrate binding (32). In M, the P1 docking site is partially occupied by P7, which is rotated ~90° and moved by several angstroms relative to N (Fig. 3C). Thus, the rotated P7 effectively blocks P1 docking, which is an essential step early in the catalytic cycle. Although particle variability analysis (movie S1) and the lower local resolution of P7 in M (fig. S7) suggest that P7 is conformationally dynamic, this helix is unable to rotate to its native position without major rearrangements due to constraints imposed by the connecting strands. It was previously hypothesized that the inactivity of M was caused by rearrangements preventing P1 docking on the basis of a cleavage assay of a 3′ splice site mimic oligo that does not require P1 docking (23). The 3D structure of M strongly supports this hypothesis. Moreover, in the context of the complete group I intron and on the basis of a structure of TET mimicking the second step of splicing (12), the position of P7 would also interfere with the formation of P10 before the second transesterification reaction, making M incompetent for self-splicing.
Last, as the guanosine-binding site is contained within P7, the large rotation and displacement of this helix would cause the bound guanosine to be in a different position relative to N, even if P1 was able to dock. Overall, the alternative position of P7, which is coupled to the entangled J8/7, explains the functional differences between N and M.

M contains small differences in the relative positions of the peripheral domains
Although large differences between M and N are observed at the core, their peripheral domains are very similar [backbone root mean square deviation (RMSD_backbone), 2.43 Å; fig. S8]. M forms native long-range tertiary contacts consistent with previous functional studies of ribozyme mutants (23). However, there are small differences in the relative position of the peripheral domains, presumably arising from the distinct core conformations. To better visualize these differences, we superimposed the P4-P6 domains (nucleotides 107 to 258) of N and M, which are nearly identical in the two states (RMSD, 0.995 Å; fig. S8). With the P4-P6 domain superimposed, the backbone of other peripheral domains is misaligned by up to ~9 Å, most noticeably in P2 and P9 (fig. S8). Previous SAXS studies suggested a 10% increase in the radius of gyration (Rg) of M versus N (32). We do not observe significant differences in compaction, and both structures have essentially the same Rg of ~38 Å. However, the local resolution distributions are very different between N and M, suggesting differences in global flexibility (fig. S7). In particular, P9, which is conformationally dynamic (12), appears to be much more flexible in M relative to N, as inferred by the local resolution of this domain (fig. S7). Thus, differences in Rg observed by SAXS may be reflective of differences in the direction and/or magnitude of these dynamics, the presence of other less-compact conformational species in the SAXS sample, or of limitations in the accuracy of Rg estimates from SAXS profiles.
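The two structural metrics used above, RMSD after superposition and radius of gyration, can be sketched as follows. This is a minimal illustration assuming matched lists of atomic coordinates; it uses the Kabsch algorithm for optimal superposition and an unweighted geometric Rg. It is not the procedure used in this work, where Rg was computed with CRYSOL, which accounts for scattering contrast and hydration and may therefore give different values.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between matched coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation of P onto Q.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = Pc @ R.T - Qc
    return float(np.sqrt((diff ** 2).sum() / len(P)))

def radius_of_gyration(coords):
    """Unweighted radius of gyration of a set of points."""
    X = np.asarray(coords, dtype=float)
    centered = X - X.mean(axis=0)
    return float(np.sqrt((centered ** 2).sum(axis=1).mean()))

# Synthetic check: a rotated and translated copy should give RMSD near 0.
rng = np.random.default_rng(1)
P = rng.normal(size=(50, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(P, Q), radius_of_gyration(P))
```

Superimposing on one domain only, as done above for P4-P6, would correspond to computing the rotation from that subset of atoms and then applying it to the full structure before measuring displacements elsewhere.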

A model for disentangling the topological error
Resolution of the topological error in M cannot occur with only local rearrangements, raising the question of how much unfolding is needed to allow the transition from M to N. Although our data cannot establish the mechanism of this transition, comparing the structures of M and N in the context of previous functional and biochemical studies can provide insights. We generated a model for the transition of M to N that requires minimal secondary structure disruption and that is consistent with previous studies (movie S2). In this model, tertiary contacts between peripheral domains and the long-range base pairs that form P3 break to allow rotation of a long hairpin that includes P7, J8/7, J7/3, nucleotides 272 to 278 of P3, and P8 (movie S2). P7 rotates ~90° to reach its native position, while P8 undergoes an ~360° rotation that is allowed by the flexibility of single strands J8/7, J7/3, and nucleotides 272 to 278. After these rotations, refolding P3 and docking of the tertiary contacts results in formation of N (movie S2). This model shows that there are paths from M to N that do not require extensive breaking of secondary structure. This model is also in agreement with observations that destabilizing tertiary interactions between peripheral domains and/or base pairs in P3 increases the rate constant for the transition from M to N (23,31). Further, because minimal disruption of secondary structure is required, the model is consistent with the ability of M to convert to N at physiological temperatures, albeit slowly.

DISCUSSION
RNA folding pathways are complex, with multiple branching points and intermediates. For decades, researchers have applied a wide array of diverse tools to detect folding intermediates and misfolded states in the folding of model RNAs. These techniques include quantitative thermodynamic and kinetic analysis, single-molecule imaging, x-ray scattering techniques, and electrophoretic mobility assays, among others (33). Pathways deduced by these techniques have proven invaluable to our understanding of RNA folding and can generate quantitative, testable predictions. What has been lacking is a 3D view of the conformational species populating the pathways. This information would provide a structural framework for interpreting decades of functional and biochemical studies and generate additional hypotheses, ultimately enhancing our predictive understanding of the folding process. The scarcity of 3D structural information is in part due to a lack of tools to readily explore the structures of these perhaps transient and/or conformationally dynamic states. Building on previous functional and biochemical studies and taking advantage of the recent advances in cryo-EM, we used the T. thermophila group I intron ribozyme as a model for RNA misfolding and solved the structure of a long-lived misfolded intermediate state that had remained mysterious for decades. In so doing, we demonstrate the power of cryo-EM as a tool to explore dynamic folding pathways of complex, functional RNAs. The term "misfolded RNA" might suggest the formation of nonnative secondary structures or global changes in tertiary structure. While these states can form, in the case of the long-lived misfolded state of the T. thermophila group I intron, referred to as the "M" state, misfolding is generated by a pair of incorrectly crossed unpaired strands that cause a topological error within a fold with native-like secondary structure and tertiary contacts. 
This topological error results in the large rotation of a functionally important domain, effectively preventing organization of the catalytic site and rendering the M state functionally incompetent. The question of how often topological errors arise within the folding pathways of other complex RNA folds is not clear, but the structure of the M-state model provides an example that will facilitate future explorations.
Given the inherent thermodynamic propensity of RNAs to misfold, it is remarkable that biology has found ways to produce functional RNAs in time scales that are consistent with life. In part, kinetic traps may be alleviated in vivo by the work of chaperones with helicase activity and/or the stepwise binding of proteins that guide the folding process. In addition, cotranscriptional folding may bias the folding pathway. A mechanistic understanding of these complex processes requires the understanding of the inherent folding properties of RNA in isolation. Experimental and computational advances have turned cryo-EM into a premier tool to solve the high-resolution native structures of biological macromolecules. Ongoing developments in time-resolved cryo-EM promise dissection of transient intermediate structures (34). Here, we demonstrate the application of conventional cryo-EM to understand the structure of a transient but long-lived misfolded intermediate in a complex, folded RNA. This state was not directly observed for many years, in part because of limitations of other structural methods. Now, direct observation of the state provides strong evidence in support of previous models proposed on the basis of functional and biochemical data (23,31). The remarkable fact that these models were largely correct, even in the absence of a 3D structure, illustrates the power of rigorous quantitative biochemistry and biophysics to sort out complex problems in RNA structure and folding and the natural marriage between these methods and cryo-EM.

MATERIALS AND METHODS
Preparation of folded TET RNA
We ordered a DNA fragment (gBlocks, Integrated DNA Technologies) containing the L-21 Sca I TET sequence flanked by the T7 RNA polymerase promoter sequence. The DNA fragment was polymerase chain reaction (PCR)-amplified, purified using the GeneJET PCR Purification Kit (Thermo Fisher Scientific), and used as a template for in vitro transcription.
Before preparation of cryo-EM grids, the RNA was folded in 10 mM MgCl2 and 50 mM Na-Mops (~30 mM Na+) (pH 7.0) by incubating at 25°C for 15 min. These conditions accumulate the misfolded state of the ribozyme (23).

Data collection and analysis
Data were collected at the Pacific Northwest Center for Cryo-EM (PNCC) with a 300-kV Thermo Fisher Scientific Krios transmission electron microscope in super resolution mode (pixel size, 0.5395 Å), equipped with a Falcon 3 direct electron detector and a Bioquantum K3 imaging filter, and using SerialEM data collection software. We collected 6222 movies (46 frames) with a total dose of 32 e⁻/Å² and a defocus range of 0.8 to 2.0 μm (table S1).
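Two quantities follow directly from the acquisition parameters above and can be worked out in a short sketch: the average dose per movie frame and the Nyquist limit at the stated pixel size. The sketch assumes the total dose is spread evenly over the 46 frames, which is not stated in the text.

```python
# Acquisition parameters quoted in the text.
TOTAL_DOSE_E_PER_A2 = 32.0      # electrons per square angstrom, whole movie
N_FRAMES = 46                   # frames per movie
SUPER_RES_PIXEL_A = 0.5395      # angstroms per pixel in super-resolution mode

# Assumption: dose is distributed evenly across frames.
dose_per_frame = TOTAL_DOSE_E_PER_A2 / N_FRAMES

# The Nyquist limit is twice the pixel size.
nyquist_limit_a = 2.0 * SUPER_RES_PIXEL_A

print(f"dose per frame: {dose_per_frame:.3f} e/A^2")
print(f"Nyquist limit:  {nyquist_limit_a:.3f} A")
```

The reported map resolutions of 3.4 and 3.9 Å are well above this sampling limit, so pixel size is not the limiting factor for the reconstructions.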
Data were processed using cryoSPARC (figs. S1 and S2) (26). Imported movies were subjected to patch motion correction and CTF (contrast transfer function) estimation with default parameters (fig. S1). Micrographs were curated to eliminate those with damaged areas, excessive ice contamination, and/or poor CTF estimation, resulting in 5709 curated micrographs. Automated particle picking and extraction (extraction box size, 480 pixels; Fourier crop to box size, 256 pixels) resulted in 4,272,705 putative particles. To remove junk, three rounds of 2D classification were performed (number of classes, 200; circular mask diameter, 170 Å; final full iterations, 2; online-EM iterations, 40; batch size per class, 500). In the last two rounds, the initial classification uncertainty factor was set to 10 to obtain a higher diversity of good classes. A total of 803,573 particles remained after 2D classification.
The particles were used to build three ab initio models. Two of the models (88% of particles; fig. S1) displayed RNA features and global structures consistent with TET. The third map (12% of the particles; fig. S1) was used as a "sink" for removal of junk and suboptimal particles in three rounds of 3D classification (heterogeneous refinement), as done previously (11). A total of 650,948 particles remained after 3D classification (n_1 = 285,646; n_2 = 365,302; fig. S2A). The two TET maps were refined using "homogeneous refinement" with default parameters. To resolve remaining conformational heterogeneity, the refined particles were subjected to 3D particle variability analysis (number of modes to solve, 2; filter resolution, 6 Å) (28) to generate four "frames" along the first principal component for each of the two original maps (fig. S2A). Total particles (n = 650,948) were reclassified between the eight frames and refined (volumes 1 to 8; fig. S2A). The maps were inspected and classified into two major classes based on differences in core density (fig. S2B). Maps with the highest resolutions (volumes 5 and 7; fig. S2A) were subjected to an additional (nonuniform) refinement. The final maps of N and M were refined to 3.4 Å (n_N = 92,828 particles) and 3.9 Å (n_M = 98,071 particles), respectively.
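The particle counts reported in the processing steps above can be cross-checked with a short bookkeeping sketch. All numbers are taken from the text; the variable names are illustrative, and the derived fractions are simple ratios rather than values reported by the authors.

```python
# Particle counts quoted in the text at each processing stage.
picked = 4_272_705          # after automated picking and extraction
after_2d = 803_573          # after three rounds of 2D classification
after_3d = 650_948          # after 3D classification with a junk "sink"
n_1, n_2 = 285_646, 365_302 # particles in the two TET-like maps
n_final_N, n_final_M = 92_828, 98_071  # particles in the final N and M maps

# The two TET-like classes should add up to the post-3D total.
classes_sum_matches = (n_1 + n_2 == after_3d)

# Fraction of particles retained at each cleaning step.
kept_after_2d = after_2d / picked
kept_after_3d = after_3d / after_2d

# Fraction of the cleaned particle set used in the two final maps.
used_in_final_maps = (n_final_N + n_final_M) / after_3d

print(classes_sum_matches)
print(f"{kept_after_2d:.3f} {kept_after_3d:.3f} {used_in_final_maps:.3f}")
```

Only the two highest-resolution volumes were carried into the final refinements, so the final particle counts are not a measure of the N and M populations in the sample.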

Structural modeling of M state
A schematic of the methods used for structural modeling is provided in fig. S5. The structure of the L-21 Sca I TET ribozyme [Protein Data Bank (PDB) ID, 7ez0], with Mg2+ ions removed, was docked into the 3.9-Å resolution map of the M state, using UCSF Chimera (fig. S5A) (35). The docked structure was then imported into Phenix (29) and a real-space refinement was performed (macro cycles, 1), constraining the secondary structure. Nucleotides corresponding to P7, P3, J8/7, J7/3, and connecting junctions were removed from the structure and were modeled using Rosetta RNA fragment assembly implemented in autoDRRAFTER (fig. S5B) (13). The truncated structure (containing the peripheral domains and P4-P6, which fit well into the map) was kept constant during modeling and was provided as an input to autoDRRAFTER, along with the sequence and secondary structure of TET and the 3.9-Å map. The protocol described under "Manually setting up an autoDRRAFTER run" in the ROSIE web server was followed (36). Three rounds of autoDRRAFTER modeling were performed (cycles, 30,000; number of models, 2000). The modeled structures converged toward a common topology (fig. S5A). The top 10 models were inspected in the context of the map; all 10 models fit equally well, and one of them was chosen for further refinements. The model was imported into Phenix for a real-space refinement (macro cycles, 5) with secondary structure constraints. Coot (30) was used to make minor adjustments to the structure around J8/7, and a final real-space refinement was performed in Phenix (macro cycles, 5). Structures were visualized using PyMOL (Schrödinger Inc.) and UCSF ChimeraX (37). Rg was calculated using CRYSOL (38). To generate the model for transition from M to N (movie S2), we manually modified structures using PyMOL to generate intermediate structures.
The geometries of the intermediate structures were regularized using Phenix geometry minimization tools and molecular dynamics flexible fitting in Namdinator (39), using synthetic electron density maps generated in Chimera as inputs. The intermediate structures were interpolated using "morph conformations" (interpolation method: corkscrew; interpolation rate linear) in Chimera to generate movie S2.

SUPPLEMENTARY MATERIALS
Supplementary material for this article is available at https://science.org/doi/10.1126/sciadv.abq4144